-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26533][SQL] Support query auto timeout cancel on thriftserver #29933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #129347 has finished for PR 29933 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #129350 has finished for PR 29933 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #129377 has finished for PR 29933 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
|
||
| if (queryTimeout > 0) { | ||
| Executors.newSingleThreadScheduledExecutor.schedule(new Runnable { | ||
| override def run(): Unit = cancel(OperationState.TIMEDOUT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean the log (that I added in L206) before starting the thread? Looks cancel itself already has a log in https://github.com/apache/spark/pull/29933/files#diff-72dcd8f81a51c8a815159fdf0332acdcR343 though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only know from the log in cancel that the query is canceled, but don't know it is timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to let user know it is canceled because of timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
|
Test build #129387 has finished for PR 29933 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
cc: @dongjoon-hyun |
|
Test build #129388 has finished for PR 29933 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu . -Phive-1.2 is removed finally. Please update this PR.
|
Sure, thanks, @dongjoon-hyun |
|
Kubernetes integration test starting |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #129775 has finished for PR 29933 at commit
|
|
Anyone could check this? |
|
retest this please |
|
Kubernetes integration test starting |
|
Test build #130032 has finished for PR 29933 at commit
|
|
retest this please |
|
Kubernetes integration test status success |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #130034 has finished for PR 29933 at commit
|
|
kindly ping. |
|
LGTM |
|
Thanks, all~, Merged to master. |
…ask on thriftserver ### What changes were proposed in this pull request? This PR add a new config `spark.sql.thriftServer.forceCancel` to give user a way to interrupt task when cancel statement. ### Why are the changes needed? After [#29933](#29933), we support cancel query if timeout, but the default behavior of `SparkContext.cancelJobGroups` won't interrupt task and just let task finish by itself. In some case it's dangerous, e.g., data skew or exists a heavily shuffle. A task will hold in a long time after do cancel and the resource will not release. ### Does this PR introduce _any_ user-facing change? Yes, a new config. ### How was this patch tested? Add test. Closes #30481 from ulysses-you/SPARK-33526. Lead-authored-by: ulysses-you <[email protected]> Co-authored-by: ulysses-you <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
 [](https://github.com/yaooqinn/kyuubi/pull/451)       [<img width="16" alt="Powered by Pull Request Badge" src="https://user-images.githubusercontent.com/1393946/111216524-d2bb8e00-85d4-11eb-821b-ed4c00989c02.png">](https://pullrequestbadge.com/?utm_medium=github&utm_source=yaooqinn&utm_campaign=badge_info)<!-- PR-BADGE: PLEASE DO NOT REMOVE THIS COMMENT --> <!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://kyuubi.readthedocs.io/en/latest/community/contributions.html 2. If the PR is related to an issue in https://github.com/yaooqinn/kyuubi/issues, add '[KYUUBI #XXXX]' in your PR title, e.g., '[KYUUBI #XXXX] Your PR title ...'. 3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][KYUUBI #XXXX] Your PR title ...'. --> ### _Why are the changes needed?_ <!-- Please clarify why the changes are needed. For instance, 1. If you add a feature, you can talk about the use case of it. 2. If you fix a bug, you can clarify why it is a bug. --> Manual cherry-pick some Spark patch into Kyuubi. 1. [Support query auto timeout cancel on thriftserver](apache/spark#29933) 2. [Add config to control if cancel invoke interrupt task on thriftserver](apache/spark#30481) In order to keep backward with early Spark version, we hard code the config key instead of refer to Spark SQLConf. Note that, the exists timeout of operator (`kyuubi.operation.idle.timeout`) is to cancel that client has no access with engine. That said if a query run a long time and the client is alive, the query would not be cancelled. Then the new added config `spark.sql.thriftServer.queryTimeout` can handle this case. ### _How was this patch tested?_ Add new test. Closes #451 from ulysses-you/query-timeout. 212f579 [ulysses-you] docs 9206538 [ulysses-you] empty flaky test ddab9bf [ulysses-you] flaty test 1da02a0 [ulysses-you] flaty test edfadf1 [ulysses-you] nit 3f9920b [ulysses-you] address comment 9492c48 [ulysses-you] correct timeout 5df997e [ulysses-you] nit 2124952 [ulysses-you] address comment 192fdcc [ulysses-you] fix tets d684af6 [ulysses-you] global config 1d1adda [ulysses-you] empty 967a63e [ulysses-you] correct import 128948e [ulysses-you] add session conf in session 144d51b [ulysses-you] fix a90248b [ulysses-you] unused import c90386f [ulysses-you] timeout move to operation manager d780965 [ulysses-you] update docs a5f7138 [ulysses-you] fix test f7c7308 [ulysses-you] config name 7f3fb3d [ulysses-you] change conf place 97a011e [ulysses-you] unnecessary change 0953a76 [ulysses-you] move test 38ac0c0 [ulysses-you] Merge branch 'master' of https://github.com/yaooqinn/kyuubi into query-timeout 71bea97 [ulysses-you] refector implementation 35ef6f9 [ulysses-you] update conf 0cad8e2 [ulysses-you] Support query auto timeout cancel on thriftserver Authored-by: ulysses-you <[email protected]> Signed-off-by: Kent Yao <[email protected]>
 [](https://github.com/yaooqinn/kyuubi/pull/451)       [<img width="16" alt="Powered by Pull Request Badge" src="https://user-images.githubusercontent.com/1393946/111216524-d2bb8e00-85d4-11eb-821b-ed4c00989c02.png">](https://pullrequestbadge.com/?utm_medium=github&utm_source=yaooqinn&utm_campaign=badge_info)<!-- PR-BADGE: PLEASE DO NOT REMOVE THIS COMMENT --> <!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://kyuubi.readthedocs.io/en/latest/community/contributions.html 2. If the PR is related to an issue in https://github.com/yaooqinn/kyuubi/issues, add '[KYUUBI #XXXX]' in your PR title, e.g., '[KYUUBI #XXXX] Your PR title ...'. 3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][KYUUBI #XXXX] Your PR title ...'. --> ### _Why are the changes needed?_ <!-- Please clarify why the changes are needed. For instance, 1. If you add a feature, you can talk about the use case of it. 2. If you fix a bug, you can clarify why it is a bug. --> Manual cherry-pick some Spark patch into Kyuubi. 1. [Support query auto timeout cancel on thriftserver](apache/spark#29933) 2. [Add config to control if cancel invoke interrupt task on thriftserver](apache/spark#30481) In order to keep backward with early Spark version, we hard code the config key instead of refer to Spark SQLConf. Note that, the exists timeout of operator (`kyuubi.operation.idle.timeout`) is to cancel that client has no access with engine. That said if a query run a long time and the client is alive, the query would not be cancelled. Then the new added config `spark.sql.thriftServer.queryTimeout` can handle this case. ### _How was this patch tested?_ Add new test. Closes #451 from ulysses-you/query-timeout. 212f579 [ulysses-you] docs 9206538 [ulysses-you] empty flaky test ddab9bf [ulysses-you] flaty test 1da02a0 [ulysses-you] flaty test edfadf1 [ulysses-you] nit 3f9920b [ulysses-you] address comment 9492c48 [ulysses-you] correct timeout 5df997e [ulysses-you] nit 2124952 [ulysses-you] address comment 192fdcc [ulysses-you] fix tets d684af6 [ulysses-you] global config 1d1adda [ulysses-you] empty 967a63e [ulysses-you] correct import 128948e [ulysses-you] add session conf in session 144d51b [ulysses-you] fix a90248b [ulysses-you] unused import c90386f [ulysses-you] timeout move to operation manager d780965 [ulysses-you] update docs a5f7138 [ulysses-you] fix test f7c7308 [ulysses-you] config name 7f3fb3d [ulysses-you] change conf place 97a011e [ulysses-you] unnecessary change 0953a76 [ulysses-you] move test 38ac0c0 [ulysses-you] Merge branch 'master' of https://github.com/yaooqinn/kyuubi into query-timeout 71bea97 [ulysses-you] refector implementation 35ef6f9 [ulysses-you] update conf 0cad8e2 [ulysses-you] Support query auto timeout cancel on thriftserver Authored-by: ulysses-you <[email protected]> Signed-off-by: Kent Yao <[email protected]> (cherry picked from commit fecdba3) Signed-off-by: Kent Yao <[email protected]>
What changes were proposed in this pull request?
Support query auto cancelling when running too long on thriftserver.
This is the rework of #28991 and the credit should be the original author, @leoluan2009.
Closes #28991
Why are the changes needed?
For some cases, we use thriftserver as long-running applications.
Some times we want all the query need not to run more than given time.
In these cases, we can enable auto cancel for time-consumed query.Which can let us release resources for other queries to run.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added tests.